{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "pipeline：\n",
    "1. 指定一个notion page（ubuntu的操作命令、conda pip的命令等），通过自定义的Loader加载；\n",
    "2. 分割成13个片段，每个大小为1000个token，两两之间重叠部分为200个token；\n",
    "3. 通过开源的BAAI/bge-m3或bge-large-zh-v1.5，对片段进行向量化，并存储在FAISS（向量相似性搜索库）中；\n",
    "4. 设计RAG prompt和RAG chain；\n",
    "5. 测试GPT在有无RAG情况下的表现。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from notion_client import Client\n",
    "import os\n",
    "from dotenv import load_dotenv\n",
    "load_dotenv() # 在当前路径下新建一个.env文档，其中存储了代理信息\n",
    "\n",
    "NOTION_API_KEY = os.getenv(\"NOTION_API_KEY\")\n",
    "NOTION_PAGE_ID = os.getenv(\"NOTION_PAGE_ID\")\n",
    "\n",
    "notion = Client(auth=NOTION_API_KEY)\n",
    "page_id = NOTION_PAGE_ID\n",
    "page_content = notion.blocks.children.list(block_id=page_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ee04ec75-c67b-460b-9aa0-b2143cead3a2\n"
     ]
    }
   ],
   "source": [
    "print(page_content.get(\"results\")[0].get('parent').get('page_id'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "以一个ubuntu 20.04 和 22.04 命令、操作汇总的notion笔记为例。  \n",
    "notion的page，是由各种block组成：\n",
    "1. 多种 block 类型处理：不同的 block 类型（如列表、段落、折叠块等）需要不同的方式提取文本。\n",
    "2. 嵌套结构：子页面和同步块可能包含更深层次的内容，需要递归获取。\n",
    "\n",
    "例子：\n",
    "1. 第一个block是bulleted_list_item，在notion中是前面带·\n",
    "2. 第二个是paragraph，其rich_text是空，再notion中是空白行\n",
    "3. 第三个是synced_block，是同步块，提供了被同步的块id，需要迭代\n",
    "4. 第四个是toggle，是个折叠块，折叠的内容得通过block id请求notion API的方式获取\n",
    "5. 最后如果页面太大，会分页：has_more、next_cursor\n",
    "\n",
    "```json\n",
    "{'object': 'list',\n",
    " 'results': [{'object': 'block',\n",
    "   'id': 'xxx',\n",
    "   'parent': {'type': 'page_id',\n",
    "    'page_id': 'xxx'},\n",
    "   'created_time': 'xxx',\n",
    "   'last_edited_time': '2025-03-06T13:49:00.000Z',\n",
    "   'created_by': {'object': 'user',\n",
    "    'id': 'xxx'},\n",
    "   'last_edited_by': {'object': 'user',\n",
    "    'id': 'xxx'},\n",
    "   'has_children': False,\n",
    "   'archived': False,\n",
    "   'in_trash': False,\n",
    "   'type': 'bulleted_list_item',\n",
    "   'bulleted_list_item': {'rich_text': [{'type': 'text',\n",
    "      'text': {'content': 'Ubuntu Oracular 24.10', 'link': None},\n",
    "      'annotations': {'bold': False,\n",
    "       'italic': False,\n",
    "       'strikethrough': False,\n",
    "       'underline': False,\n",
    "       'code': False,\n",
    "       'color': 'default'},\n",
    "      'plain_text': 'Ubuntu Oracular 24.10',\n",
    "      'href': None}],\n",
    "    'color': 'default'}},\n",
    "  {'object': 'block',\n",
    "   'id': 'xxx',\n",
    "   'parent': {'type': 'page_id',\n",
    "    'page_id': 'xxx'},\n",
    "   'created_time': 'xxx',\n",
    "   'last_edited_time': 'xxx',\n",
    "   'created_by': {'object': 'user',\n",
    "    'id': 'xxx'},\n",
    "   'last_edited_by': {'object': 'user',\n",
    "    'id': 'xxx'},\n",
    "   'has_children': False,\n",
    "   'archived': False,\n",
    "   'in_trash': False,\n",
    "   'type': 'paragraph',\n",
    "   'paragraph': {'rich_text': [], 'color': 'default'}},\n",
    "  {'object': 'block',\n",
    "   'id': 'xxx',\n",
    "   'parent': {'type': 'page_id',\n",
    "    'page_id': 'xxx'},\n",
    "   'created_time': 'xxx',\n",
    "   'last_edited_time': 'xxx',\n",
    "   'created_by': {'object': 'user',\n",
    "    'id': 'xxx'},\n",
    "   'last_edited_by': {'object': 'user',\n",
    "    'id': 'xxx'},\n",
    "   'has_children': False,\n",
    "   'archived': False,\n",
    "   'in_trash': False,\n",
    "   'type': 'synced_block',\n",
    "   'synced_block': {'synced_from': {'type': 'block_id',\n",
    "     'block_id': 'xxx'}}},\n",
    "  {'object': 'block',\n",
    "   'id': 'xxx',\n",
    "   'parent': {'type': 'page_id',\n",
    "    'page_id': 'xxx'},\n",
    "   'created_time': 'xxx',\n",
    "   'last_edited_time': 'xxx',\n",
    "   'created_by': {'object': 'user',\n",
    "    'id': 'xxx'},\n",
    "   'last_edited_by': {'object': 'user',\n",
    "    'id': 'xxx'},\n",
    "   'has_children': True,\n",
    "   'archived': False,\n",
    "   'in_trash': False,\n",
    "   'type': 'toggle',\n",
    "   'toggle': {'rich_text': [{'type': 'text',\n",
    "      'text': {'content': '笔记本盒盖时行为设置', 'link': None},\n",
    "      'annotations': {'bold': False,\n",
    "       'italic': False,\n",
    "       'strikethrough': False,\n",
    "       'underline': False,\n",
    "       'code': False,\n",
    "       'color': 'default'},\n",
    "      'plain_text': '笔记本盒盖时行为设置',\n",
    "      'href': None}],\n",
    "    'color': 'default'}}],\n",
    " 'next_cursor': 'xxx',\n",
    " 'has_more': True,\n",
    " 'type': 'block',\n",
    " 'block': {},\n",
    " 'request_id': 'xxx'}\n",
    " ```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_all_text(page_id):\n",
    "    text = \"\"\n",
    "    # 根据page_id，从notion api中获取页面\n",
    "    response = notion.blocks.children.list(block_id=page_id)\n",
    "\n",
    "    while True:\n",
    "        for block in response.get(\"results\", []):\n",
    "            # 提取当前 Block 文本\n",
    "            block_type = block[\"type\"]\n",
    "            content = block[block_type]\n",
    "            if \"rich_text\" in content:\n",
    "                for rt in content[\"rich_text\"]:\n",
    "                    text += rt.get(\"text\", {}).get(\"content\", \"\") + \"\\n\"\n",
    "            # 递归处理子 Blocks\n",
    "            if block.get(\"has_children\"):\n",
    "                text += get_all_text(block[\"id\"])\n",
    "        # 处理分页\n",
    "        if not response.get(\"has_more\"):\n",
    "            break\n",
    "        response = notion.blocks.children.list(\n",
    "            block_id=page_id,\n",
    "            start_cursor=response[\"next_cursor\"]\n",
    "        )\n",
    "    return text\n",
    "\n",
    "page_text = get_all_text(page_id)\n",
    "# 2min"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.docstore.document import Document\n",
    "\n",
    "documents = []\n",
    "documents.append(Document(page_content=page_text, metadata={\"source\": page_id}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size=1000,  # 块大小\n",
    "    chunk_overlap=200  # 块间重叠\n",
    ")\n",
    "splits = text_splitter.split_documents(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "13\n"
     ]
    }
   ],
   "source": [
    "print(len(splits))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 下载 Embedding 模型，从 HuggingFace 拉取模型至本地\n",
    "\n",
    "# ! cd models/sentence_embedding\n",
    "# ! git clone https://huggingface.co/BAAI/bge-m3\n",
    "# ! git clone https://huggingface.co/BAAI/bge-large-zh\n",
    "# ! git clone https://huggingface.co/BAAI/bge-large-zh-v1.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/hey/anaconda3/envs/chatbot_temp/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "# https://python.langchain.com/api_reference/huggingface/embeddings/langchain_huggingface.embeddings.huggingface.HuggingFaceEmbeddings.html\n",
    "\n",
    "# from langchain.embeddings import HuggingFaceEmbeddings # 已淘汰\n",
    "from langchain_huggingface import HuggingFaceEmbeddings\n",
    "\n",
    "# 本地：models/sentence_embedding/bge-large-zh、bge-large-zh-v1.5、bge-m3\n",
    "# 从hf获取：BAAI/bge-large-zh、BAAI/bge-large-zh-v1.5、BAAI/bge-m3\n",
    "model_name = \"../models/sentence_embedding/bge-large-zh-v1.5\"\n",
    "model_kwargs = {'device': 'cuda'}\n",
    "# normalize_embeddings （ bool ，可选）– 是否将返回的向量标准化为长度 1。在这种情况下，可以使用更快的点积（util.dot_score）而不是余弦相似度。默认为 False。\n",
    "encode_kwargs = {'normalize_embeddings': False}\n",
    "embeddings = HuggingFaceEmbeddings(\n",
    "    model_name=model_name,\n",
    "    model_kwargs=model_kwargs,\n",
    "    encode_kwargs=encode_kwargs\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1024\n"
     ]
    }
   ],
   "source": [
    "# 测试Embedding生成\n",
    "text = \"test\"\n",
    "vec = embeddings.embed_query(text)\n",
    "print(len(vec))  # 输出向量维度，如 text-embedding-3-small: 1536、bge-large-zh-v1.5: 1024"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.vectorstores import FAISS\n",
    "\n",
    "# 存储到FAISS向量库\n",
    "vectorstore = FAISS.from_documents(splits, embeddings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "retriever = vectorstore.as_retriever(search_kwargs={\"k\": 5})  # 返回top3结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "\n",
    "template = \"\"\"基于以下上下文回答问题：\n",
    "{context}\n",
    "\n",
    "问题：{question}\n",
    "\"\"\"\n",
    "prompt = ChatPromptTemplate.from_template(template)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from langchain_openai import ChatOpenAI\n",
    "\n",
    "# OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\")\n",
    "# OPENAI_BASE_URL = os.getenv(\"OPENAI_BASE_URL\")\n",
    "\n",
    "# model = ChatOpenAI(\n",
    "#     api_key=OPENAI_API_KEY,\n",
    "#     base_url=OPENAI_BASE_URL,  # 自定义API地址\n",
    "#     model=\"gpt-4-turbo\"\n",
    "# )\n",
    "\n",
    "from langchain_ollama import ChatOllama\n",
    "\n",
    "# 初始化本地Ollama模型\n",
    "llm = ChatOllama(\n",
    "    model=\"qwen2.5:32b\",\n",
    "    temperature=0.1,\n",
    "    base_url=\"http://localhost:11434\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "from langchain.schema.runnable import RunnablePassthrough\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "\n",
    "# 构建RAG链\n",
    "rag_chain = (\n",
    "    {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
    "    | prompt \n",
    "    | llm \n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "要复制一个conda环境，可以使用`conda create`命令，并加上`--clone`选项。以下是一个例子：\n",
      "\n",
      "假设你有一个名为`py36`的环境，你想基于这个环境创建一个新的副本叫做`py36_bak`，你可以运行如下命令：\n",
      "\n",
      "```bash\n",
      "conda create -n py36_bak --clone py36\n",
      "```\n",
      "\n",
      "这条命令会复制`py36`环境中所有的包和依赖到新创建的名为`py36_bak`的环境里。\n"
     ]
    }
   ],
   "source": [
    "response = rag_chain.invoke(\"如何复制conda环境？\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'context': [Document(id='1c2269c2-e880-425b-8a09-161f0ec7fae4', metadata={'source': 'ee04ec75c67b460b9aa0b2143cead3a2'}, page_content='conda env remove -p 要删除的虚拟环境路径\\n根据environment.yaml安装conda环境\\n比如splatam的代码仓库里给了个environment.yaml\\nname: splatam\\nchannels:\\n  - pytorch\\n  - conda-forge\\n  - defaults\\ndependencies:\\n  - cudatoolkit=11.6\\n  - python=3.10\\n  - pytorch=1.12.1\\n  - torchaudio=0.12.1\\n  - torchvision=0.13.1\\n  - tqdm=4.65.0\\n  - Pillow\\n  - faiss-gpu\\n  - opencv\\n  - imageio\\n  - matplotlib\\n  - kornia\\n  - natsort\\n  - pyyaml\\n  - wandb\\n  - pip\\n  - pip:\\n    - lpips\\n    - open3d==0.16.0\\n    - torchmetrics\\n    - cyclonedds\\n    - git+https://github.com/JonathonLuiten/diff-gaussian-rasterization-w-depth/tree/cb65e4b86bc3bd8ed42174b72a62e8d3a3a71110\\n# 先退出所有环境\\nconda deactivate\\n# 根据environment.yaml安装指定环境\\nconda env create -f environment.yml -v\\n其中channels指定了包的搜索来源，比如cudatoolkit需要conda-forge\\nconda install -c conda-forge cudatoolkit=11.6 # 指定channels，否则defaults channel中是找不到cudatoolkit的\\n如果“conda env create -f environment.yml -v”速度慢，有加速的办法，比如用mamba，更简单的就是升级conda版本\\n$ conda --version\\n\\tconda 23.11.0\\n$ conda env create -f environment.yml -v'), Document(id='dd28dba3-9685-4f01-aa56-47329fbc2fed', metadata={'source': 'ee04ec75c67b460b9aa0b2143cead3a2'}, page_content='$ python test.py \\n4.10.0\\n/home/hey/anaconda3/envs/test/lib/python3.8/site-packages/cv2.cpython-38-x86_64-linux-gnu.so\\n# 验证一下，完成任务了\\nubuntu 2204安装conda\\n下载最新当前最新版本的conda\\nconda config --set auto_activate_base false\\nconda init --reverse $SHELL\\nubuntu 20.04 安装 conda\\n下载最新当前（20230520）最新版本的conda：\\nAnaconda3-2023.03-1-Linux-x86_64.sh\\nhttps://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2023.07-1-Linux-x86_64.sh\\nbash Anaconda3-2021.11-Linux-x86_64.sh\\n安装位置在：/home/hey/anaconda3\\n自动初始化？yes\\n（如果选择yes，则每次打开新的终端时，都会默认进入到base环境）\\n升级conda（解决DEBUG:urllib3.connectionpool的问题）\\n旧版本conda执行安装命令时总是会报DEBUG:urllib3.connectionpool:Starting new HTTPS conne\\n解决办法就是升级conda\\nconda update conda\\n删除指定的conda env\\n# 退出当前环境\\nconda deactivate\\n# 列出当前环境\\nconda env list\\n# 通过环境名来删除指定环境\\nconda remove -n 需要删除的环境名 --all\\n# 通过环境的路径删除指定环境\\nconda env remove -p 要删除的虚拟环境路径\\n根据environment.yaml安装conda环境\\n比如splatam的代码仓库里给了个environment.yaml\\nname: splatam\\nchannels:\\n  - pytorch\\n  - conda-forge\\n  - defaults\\ndependencies:\\n  - cudatoolkit=11.6'), Document(id='632dd147-cffb-4d5c-ba25-12fe19581070', metadata={'source': 'ee04ec75c67b460b9aa0b2143cead3a2'}, page_content='通过environment.yaml环境文件创建文件： conda env create -f environment.yaml\\n查看已安装的包：conda list\\n搜索包：\\n模糊搜索：conda search <package_name1>\\nconda list | findstr torch\\nconda search --full-name <package name> 搜索指定的包\\n比如 conda search --full-name tensorflow 显示所有的tensorflow包。\\n安装包：conda install <package_name1> <package_name2>\\n卸载包：\\nconda remove <package_name>\\nconda remove --name [env_name] --all，回车后出现一系列在这个环境所安装的包；输入【y】进行环境的删除。\\nconda uninstall package_name(包名)\\ncopy环境：\\nconda create\\n --\\nname 新环境名\\n --\\nclone 旧环境名\\n检查更新当前的conda版本\\nconda update conda\\n参考\\nv2calibration代码环境配置（conda）\\nwin10 conda环境中conda无法安装opencv-python，以及pip失效问题\\n在win的cmd中安装opencv-python和opencv-contrib-python可真气死我了\\n进入conda虚拟环境\\nconda search opencv-python\\n无，查看一下源，有清华源，没问题。但conda里是有opencv的\\n这个能用不？\\npip search opencv-python\\n懵逼，，\\n继续懵逼\\n翻不翻墙都不行\\n谷歌了一下，有两个解决办法\\n临时办法：\\npip install opencv-python==3.4.15.55 -i \\nhttp://pypi.douban.com/simple\\n --trusted-host \\npypi.douban.com\\n临时走一下豆瓣的源\\n永久办法：\\nref：\\n[global]\\nindex-url = https://pypi.tuna.tsinghua.edu.cn/simple\\n[install]'), Document(id='ee3c4b11-c889-413e-bfcb-48931ad44b8a', metadata={'source': 'ee04ec75c67b460b9aa0b2143cead3a2'}, page_content='conda config --set proxy_servers.https https://proxy_server:proxy_port\\npip常用命令\\npip list 查看以及安装的包和版本号\\npip --version：查看已经安装了的pip版本\\npip install -U pip：升级pip\\npip list 或 pip freeze：查看当前已经安装好了包及版本\\npip install package_name(包名)：下载安装包\\npip uninstall package_name(包名)： 卸载安装包\\npip show package_name(包名)：显示安装包信息（安装路径、依赖关系等）\\nconda常用命令 \\n创建环境：conda create -n <env_name> <packages>\\n基于python3.6创建一个名为py36的环境\\nconda create --name py36 python=3.6\\n激活环境：conda activate <env_name>\\n退出环境：conda deactivate <env_name>\\n查看已安装的环境信息：\\nconda env list\\nconda info -e \\n$ pip list\\n复制环境：conda create -n <new_env_name> --clone <origin_env_name>\\n通过克隆py36来创建一个称为py36_bak的副本：\\nconda create -n py36_bak --clone py36\\n删除环境：\\nconda env remove -n <env_name>\\n保存环境信息到environment.yaml文件中：conda env export > environment.yaml\\n通过environment.yaml环境文件创建文件： conda env create -f environment.yaml\\n查看已安装的包：conda list\\n搜索包：\\n模糊搜索：conda search <package_name1>\\nconda list | findstr torch\\nconda search --full-name <package name> 搜索指定的包'), Document(id='a97584d4-78f1-4ee0-9b80-9aab91d7d02c', metadata={'source': 'ee04ec75c67b460b9aa0b2143cead3a2'}, page_content='如果“conda env create -f environment.yml -v”速度慢，有加速的办法，比如用mamba，更简单的就是升级conda版本\\n$ conda --version\\n\\tconda 23.11.0\\n$ conda env create -f environment.yml -v\\n\\tChannels:\\n\\t - pytorch\\n\\t - conda-forge\\n\\t - defaults\\n\\tPlatform: linux-64\\n\\tCollecting package metadata (repodata.json): ...working... info     libmamba Reading cache files \\'/tmp/tmpv4iityir.json.*\\' for repo index \\'installed\\'\\n\\tinfo     \\nlibmamba\\n Reading repodata.json file \"/tmp/tmpv4iityir.json\" for repo installed\\n\\tinfo     \\nlibmamba\\n Reading cache files \\'/home/hey/anaconda3/pkgs/cache/ee0ed9e9.json.*\\' for repo index \\'https://conda.anaconda.org/pytorch/linux-64\\'\\n\\t...\\n23.11.0 已经自动用上manba了\\n删除conda环境中的指定包\\nconda remove --name $your_env_name $package_name\\npip 换国内源（加速）\\n用国外源有两个缺点：费流量、shell中让conda pip走代理，还得设置\\npip 临时走一下国内源：（有时某个源找不到这个包或下载速度慢，就换另一个源）\\n# 清华源\\npip install opencv-python==3.4.15.55 -i https://pypi.tuna.tsinghua.edu.cn/simple\\n# 豆瓣源\\npip install opencv-python==3.4.15.55 -i \\nhttp://pypi.douban.com/simple\\n --trusted-host \\npypi.douban.com')], 'question': '如何复制conda环境？'}\n"
     ]
    }
   ],
   "source": [
    "from langchain.schema.runnable import RunnablePassthrough\n",
    "from langchain_core.runnables import RunnableParallel\n",
    "\n",
    "partial_chain = RunnableParallel( \n",
    "    {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
    ")\n",
    "response = partial_chain.invoke(\"如何复制conda环境？\")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```json\n",
    "{\n",
    "    'context': [\n",
    "        Document(\n",
    "            id='1c2269c2-e880-425b-8a09-161f0ec7fae4', \n",
    "            metadata={'source': 'ee04ec75c67b460b9aa0b2143cead3a2'}, \n",
    "            page_content='conda env remove -p 要删除的虚拟环境路径\\n根据environment.yaml安装conda环境\\n比如splatam的代码仓库里给了个environment.yaml\\nname: splatam\\nchannels:\\n  - pytorch\\n  - conda-forge\\n  - defaults\\ndependencies:\\n  - cudatoolkit=11.6\\n  - python=3.10\\n  - pytorch=1.12.1\\n  - torchaudio=0.12.1\\n  - torchvision=0.13.1\\n  - tqdm=4.65.0\\n  - Pillow\\n  - faiss-gpu\\n  - opencv\\n  - imageio\\n  - matplotlib\\n  - kornia\\n  - natsort\\n  - pyyaml\\n  - wandb\\n  - pip\\n  - pip:\\n    - lpips\\n    - open3d==0.16.0\\n    - torchmetrics\\n    - cyclonedds\\n    - git+https://github.com/JonathonLuiten/diff-gaussian-rasterization-w-depth/tree/cb65e4b86bc3bd8ed42174b72a62e8d3a3a71110\\n# 先退出所有环境\\nconda deactivate\\n# 根据environment.yaml安装指定环境\\nconda env create -f environment.yml -v\\n其中channels指定了包的搜索来源，比如cudatoolkit需要conda-forge\\nconda install -c conda-forge cudatoolkit=11.6 # 指定channels，否则defaults channel中是找不到cudatoolkit的\\n如果“conda env create -f environment.yml -v”速度慢，有加速的办法，比如用mamba，更简单的就是升级conda版本\\n$ conda --version\\n\\tconda 23.11.0\\n$ conda env create -f environment.yml -v'), \n",
    "        Document(\n",
    "            id='dd28dba3-9685-4f01-aa56-47329fbc2fed', \n",
    "            metadata={'source': 'ee04ec75c67b460b9aa0b2143cead3a2'}, \n",
    "            page_content='$ python test.py \\n4.10.0\\n/home/hey/anaconda3/envs/test/lib/python3.8/site-packages/cv2.cpython-38-x86_64-linux-gnu.so\\n# 验证一下，完成任务了\\nubuntu 2204安装conda\\n下载最新当前最新版本的conda\\nconda config --set auto_activate_base false\\nconda init --reverse $SHELL\\nubuntu 20.04 安装 conda\\n下载最新当前（20230520）最新版本的conda：\\nAnaconda3-2023.03-1-Linux-x86_64.sh\\nhttps://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2023.07-1-Linux-x86_64.sh\\nbash Anaconda3-2021.11-Linux-x86_64.sh\\n安装位置在：/home/hey/anaconda3\\n自动初始化？yes\\n（如果选择yes，则每次打开新的终端时，都会默认进入到base环境）\\n升级conda（解决DEBUG:urllib3.connectionpool的问题）\\n旧版本conda执行安装命令时总是会报DEBUG:urllib3.connectionpool:Starting new HTTPS conne\\n解决办法就是升级conda\\nconda update conda\\n删除指定的conda env\\n# 退出当前环境\\nconda deactivate\\n# 列出当前环境\\nconda env list\\n# 通过环境名来删除指定环境\\nconda remove -n 需要删除的环境名 --all\\n# 通过环境的路径删除指定环境\\nconda env remove -p 要删除的虚拟环境路径\\n根据environment.yaml安装conda环境\\n比如splatam的代码仓库里给了个environment.yaml\\nname: splatam\\nchannels:\\n  - pytorch\\n  - conda-forge\\n  - defaults\\ndependencies:\\n  - cudatoolkit=11.6'), \n",
    "        Document(\n",
    "            id='632dd147-cffb-4d5c-ba25-12fe19581070', \n",
    "            metadata={'source': 'ee04ec75c67b460b9aa0b2143cead3a2'}, \n",
    "            page_content='通过environment.yaml环境文件创建文件： conda env create -f environment.yaml\\n查看已安装的包：conda list\\n搜索包：\\n模糊搜索：conda search <package_name1>\\nconda list | findstr torch\\nconda search --full-name <package name> 搜索指定的包\\n比如 conda search --full-name tensorflow 显示所有的tensorflow包。\\n安装包：conda install <package_name1> <package_name2>\\n卸载包：\\nconda remove <package_name>\\nconda remove --name [env_name] --all，回车后出现一系列在这个环境所安装的包；输入【y】进行环境的删除。\\nconda uninstall package_name(包名)\\ncopy环境：\\nconda create\\n --\\nname 新环境名\\n --\\nclone 旧环境名\\n检查更新当前的conda版本\\nconda update conda\\n参考\\nv2calibration代码环境配置（conda）\\nwin10 conda环境中conda无法安装opencv-python，以及pip失效问题\\n在win的cmd中安装opencv-python和opencv-contrib-python可真气死我了\\n进入conda虚拟环境\\nconda search opencv-python\\n无，查看一下源，有清华源，没问题。但conda里是有opencv的\\n这个能用不？\\npip search opencv-python\\n懵逼，，\\n继续懵逼\\n翻不翻墙都不行\\n谷歌了一下，有两个解决办法\\n临时办法：\\npip install opencv-python==3.4.15.55 -i \\nhttp://pypi.douban.com/simple\\n --trusted-host \\npypi.douban.com\\n临时走一下豆瓣的源\\n永久办法：\\nref：\\n[global]\\nindex-url = https://pypi.tuna.tsinghua.edu.cn/simple\\n[install]'), \n",
    "        Document(\n",
    "            id='ee3c4b11-c889-413e-bfcb-48931ad44b8a', \n",
    "            metadata={'source': 'ee04ec75c67b460b9aa0b2143cead3a2'}, \n",
    "            page_content='conda config --set proxy_servers.https https://proxy_server:proxy_port\\npip常用命令\\npip list 查看以及安装的包和版本号\\npip --version：查看已经安装了的pip版本\\npip install -U pip：升级pip\\npip list 或 pip freeze：查看当前已经安装好了包及版本\\npip install package_name(包名)：下载安装包\\npip uninstall package_name(包名)： 卸载安装包\\npip show package_name(包名)：显示安装包信息（安装路径、依赖关系等）\\nconda常用命令 \\n创建环境：conda create -n <env_name> <packages>\\n基于python3.6创建一个名为py36的环境\\nconda create --name py36 python=3.6\\n激活环境：conda activate <env_name>\\n退出环境：conda deactivate <env_name>\\n查看已安装的环境信息：\\nconda env list\\nconda info -e \\n$ pip list\\n复制环境：conda create -n <new_env_name> --clone <origin_env_name>\\n通过克隆py36来创建一个称为py36_bak的副本：\\nconda create -n py36_bak --clone py36\\n删除环境：\\nconda env remove -n <env_name>\\n保存环境信息到environment.yaml文件中：conda env export > environment.yaml\\n通过environment.yaml环境文件创建文件： conda env create -f environment.yaml\\n查看已安装的包：conda list\\n搜索包：\\n模糊搜索：conda search <package_name1>\\nconda list | findstr torch\\nconda search --full-name <package name> 搜索指定的包'), \n",
    "        Document(\n",
    "            id='a97584d4-78f1-4ee0-9b80-9aab91d7d02c', \n",
    "            metadata={'source': 'ee04ec75c67b460b9aa0b2143cead3a2'}, \n",
    "            page_content='如果“conda env create -f environment.yml -v”速度慢，有加速的办法，比如用mamba，更简单的就是升级conda版本\\n$ conda --version\\n\\tconda 23.11.0\\n$ conda env create -f environment.yml -v\\n\\tChannels:\\n\\t - pytorch\\n\\t - conda-forge\\n\\t - defaults\\n\\tPlatform: linux-64\\n\\tCollecting package metadata (repodata.json): ...working... info     libmamba Reading cache files \\'/tmp/tmpv4iityir.json.*\\' for repo index \\'installed\\'\\n\\tinfo     \\nlibmamba\\n Reading repodata.json file \"/tmp/tmpv4iityir.json\" for repo installed\\n\\tinfo     \\nlibmamba\\n Reading cache files \\'/home/hey/anaconda3/pkgs/cache/ee0ed9e9.json.*\\' for repo index \\'https://conda.anaconda.org/pytorch/linux-64\\'\\n\\t...\\n23.11.0 已经自动用上manba了\\n删除conda环境中的指定包\\nconda remove --name $your_env_name $package_name\\npip 换国内源（加速）\\n用国外源有两个缺点：费流量、shell中让conda pip走代理，还得设置\\npip 临时走一下国内源：（有时某个源找不到这个包或下载速度慢，就换另一个源）\\n# 清华源\\npip install opencv-python==3.4.15.55 -i https://pypi.tuna.tsinghua.edu.cn/simple\\n# 豆瓣源\\npip install opencv-python==3.4.15.55 -i \\nhttp://pypi.douban.com/simple\\n --trusted-host \\npypi.douban.com')\n",
    "        ], \n",
    "    'question': '如何复制conda环境？'\n",
    "}\n",
    "```\n",
    "\n",
    "document3和4都命中了question"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "在使用 Conda 管理 Python 开发环境时，有时需要复制一个已有的环境以保持一致性或进行实验。以下是复制 Conda 环境的步骤：\n",
      "\n",
      "1. **列出所有环境**：首先可以查看当前所有的 Conda 环境。\n",
      "   ```bash\n",
      "   conda env list\n",
      "   ```\n",
      "   或者使用简写形式：\n",
      "   ```bash\n",
      "   conda info --envs\n",
      "   ```\n",
      "\n",
      "2. **导出环境配置文件**：选择一个要复制的环境，然后将其包列表导出到一个 YAML 文件中。假设你要复制名为 `my_env` 的环境。\n",
      "   ```bash\n",
      "   conda env export --name my_env > environment.yml\n",
      "   ```\n",
      "   这个命令会生成一个 `environment.yml` 文件，其中包含了该环境中所有已安装的包及其版本信息。\n",
      "\n",
      "3. **创建新环境**：使用导出的 YAML 文件来创建一个新的 Conda 环境。假设你想将新的环境命名为 `my_env_copy`。\n",
      "   ```bash\n",
      "   conda env create --name my_env_copy --file environment.yml\n",
      "   ```\n",
      "\n",
      "4. **验证新环境**：你可以通过激活新环境并列出其包来检查是否成功复制了原环境。\n",
      "   ```bash\n",
      "   conda activate my_env_copy\n",
      "   conda list\n",
      "   ```\n",
      "\n",
      "以上步骤将帮助你完整地复制一个 Conda 环境。注意，如果原环境中包含一些非 PyPI 源的包（例如从 Anaconda 仓库），确保你的新环境也能访问这些源。\n",
      "\n",
      "另外，如果你只需要快速复制环境而不关心特定版本号，可以使用 `--from-history` 参数来导出环境配置文件：\n",
      "```bash\n",
      "conda env export --name my_env --from-history > environment.yml\n",
      "```\n",
      "这种方式会忽略一些非必需的包依赖关系，只保留那些通过显式命令安装的包。\n"
     ]
    }
   ],
   "source": [
    "# 无参考资料时，gpt的回复：\n",
    "\n",
    "response = llm.invoke(\"如何复制conda环境？\")\n",
    "print(response.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "总结，对于复制conda环境的问题，rag和非rag给了不同的角度的回答，rag的回答是clone，非rag回答是export后create，虽然笔记里这两种的命令都给了。也许应该测试有时效性或者个人问题，而非通用问题。\n",
    "<img src=\"./assets/截图 2025-03-13 15-41-06.png\" alt=\"运行截图\" width=\"30%\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "chatbot",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
